Parent Assignment Is Hard for the MDL, AIC, and NML Costs
Author
Abstract
Several hardness results are presented for the parent assignment problem: given m observations of n attributes x1, …, xn, find the best parents for xn, that is, a subset of the preceding attributes that minimizes a fixed cost function. This attribute or feature selection task plays an important role, e.g., in structure learning in Bayesian networks, yet little is known about its computational complexity. In this paper we prove that, under the commonly adopted full multinomial likelihood model, the MDL, BIC, or AIC cost cannot be approximated in polynomial time to a ratio less than 2 unless there exists a polynomial-time algorithm for determining whether a directed graph with n nodes has a dominating set of size log n, a LOGSNP-complete problem for which no polynomial-time algorithm is known. As we also show, it is unlikely that these penalized maximum likelihood costs can be approximated to within any constant ratio. For the NML (normalized maximum likelihood) cost we prove an NP-completeness result. These results both justify the application of existing methods and motivate research on heuristic and super-polynomial-time algorithms.
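To make the problem statement concrete, here is a minimal sketch of the parent assignment task for the BIC/MDL cost under the full multinomial model: score each candidate parent set by its penalized negative maximized log-likelihood and pick the minimizer by exhaustive search. The function names and the brute-force search are illustrative only; as the hardness results above show, no polynomial-time algorithm with a good approximation ratio is expected, and the search below is exponential in the number of candidates.

```python
from itertools import combinations
from math import log
from collections import Counter

def bic_cost(data, target, parents):
    """Penalized negative log-likelihood (BIC/MDL form) of column `target`
    given the columns in `parents`, under the full multinomial model."""
    m = len(data)
    # Joint counts of (parent configuration, target value) and
    # marginal counts of each parent configuration.
    joint = Counter((tuple(row[p] for p in parents), row[target]) for row in data)
    marg = Counter(tuple(row[p] for p in parents) for row in data)
    # Maximized log-likelihood: sum over cells of count * log(conditional MLE).
    ll = sum(c * log(c / marg[pa]) for (pa, _), c in joint.items())
    # BIC penalty: (r - 1) free parameters per observed parent configuration,
    # each charged (log m) / 2.
    r = len({row[target] for row in data})   # arity of the target attribute
    q = len(marg)                            # number of observed parent configs
    return -ll + (r - 1) * q * 0.5 * log(m)

def best_parents(data, target, candidates):
    """Exhaustive search over all parent subsets (exponential in general)."""
    return min((s for k in range(len(candidates) + 1)
                for s in combinations(candidates, k)),
               key=lambda s: bic_cost(data, target, s))
```

For instance, on data where the last attribute is an exact copy of the first, the search returns the singleton parent set containing the first attribute: the copy explains the target perfectly while paying the smallest penalty.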
Similar papers
Clustering Change Detection Using Normalized Maximum Likelihood Coding
We are concerned with the issue of detecting changes of clustering structures from multivariate time series. From the viewpoint of the minimum description length (MDL) principle, we introduce an algorithm that tracks changes of clustering structures so that the sum of the code-length for data and that for clustering changes is minimum. Here we employ a Gaussian mixture model (GMM) as representa...
Scoring functions for learning Bayesian networks
The aim of this work is to benchmark scoring functions used by Bayesian network learning algorithms in the context of classification. We considered both information-theoretic scores, such as LL, AIC, BIC/MDL, NML and MIT, and Bayesian scores, such as K2, BD, BDe and BDeu. We tested the scores in a classification task by learning the optimal TAN classifier with benchmark datasets. We conclude th...
Revisiting enumerative two-part crude MDL for Bernoulli and multinomial distributions (Extended version)
We exploit the Minimum Description Length (MDL) principle as a model selection technique for Bernoulli distributions and compare several types of MDL codes. We first present a simplistic crude two-part MDL code and a Normalized Maximum Likelihood (NML) code. We then focus on the enumerative two-part crude MDL code, suggest a Bayesian interpretation for finite size data samples, and exhibit a st...
An Empirical Study of MDL Model Selection with Infinite Parametric Complexity
Parametric complexity is a central concept in MDL model selection. In practice it often turns out to be infinite, even for quite simple models such as the Poisson and Geometric families. In such cases, MDL model selection as based on NML and Bayesian inference based on Jeffreys’ prior can not be used. Several ways to resolve this problem have been proposed. We conduct experiments to compare and...
NML Computation Algorithms for Tree-Structured Multinomial Bayesian Networks
Typical problems in bioinformatics involve large discrete datasets. Therefore, in order to apply statistical methods in such domains, it is important to develop efficient algorithms suitable for discrete data. The minimum description length (MDL) principle is a theoretically well-founded, general framework for performing statistical inference. The mathematical formalization of MDL is based on t...